In this document we describe, with a simple example and model, how neural networks work, in order to understand the technology that is enabling the implementation of Artificial Intelligence.
We will build a Neural Network capable of classifying Iris flowers according to the three species: Setosa, Versicolor and Virginica. We selected the Iris species problem as it is commonly used to test classification algorithms.
As we want to focus on a simple Neural Network, to keep it easy and understandable, we will base our classification on the length and width of the Petal and Sepal, according to the Iris Data Set (see the attribute information box). It is in fact known that one can classify the Iris type based on those “features”. In contrast, we will not develop an image-recognition Convolutional Neural Network, as we want to keep the Neural Net simple and understandable.
“FIGURE 1 - IRIS”
So our input is the Iris Data Set, summarized here:
summary(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100 setosa :50
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300 versicolor:50
Median :5.800 Median :3.000 Median :4.350 Median :1.300 virginica :50
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
We will use a standard LEARN and TEST approach, like at school: you have exercises with solutions that you go through to learn and to verify that you are learning, and then you have the test phase, where you get an exercise and the teacher verifies your result. This approach is used with Neural Networks as well, and it is called Supervised Learning. The learning phase is also called the training phase.
For the training phase we will use a portion of the Iris Data Set, and the remaining part of the data set will be used for the testing phase. Let’s consider about 66% of the data for training and about 33% of the data for testing. As the entire data set consists of 150 rows, we will use 100 rows as exercises to train on, like homework, and 50 rows as exercises to test on, like class tests.
So our “student”, that is the Neural Network (remember that it doesn’t know anything about Iris species), will be trained with 100 exercises. During this phase it will learn just by “looking” at the examples; you don’t need to explain anything. After this phase we will test it with the remaining 50 exercises, but this time we will provide the Neural Network only with the Petal and Sepal length and width, and we will ask it to classify the Iris. We will then check its answers against the solutions that we have.
# set.seed(10)
# iris[sample(1:150,10),]
So the test will be performed on 50 exercises. See the following picture:
“FIGURE 2 - TRAINING AND TESTING PHASES”
# set.seed(150)
# Species = rep("?",10)
# cbind(iris[sample(1:150,10),1:4],Species)
So after the training we will ask the Neural Network to replace the question marks with the proper answer: Setosa, Versicolor or Virginica.
Doesn’t it sound a bit like magic?
That’s the magic of Neural Network learning. Think about the magic of a child learning a language just by listening, with the parents sometimes correcting him. Neural Networks work in a similar way, as they try to mimic how the brain works.
But now let’s see what a Neural Network is.
A Neural Network consists of neural units connected among themselves and connected with the world through input and output. So basically a Neural Network has one input layer, one output layer and several hidden layers. In this case, all neural units of a layer are connected to all neural units of the following layer. This kind of Neural Network is a Feed Forward Neural Network, in contrast to Convolutional and Recurrent Neural Networks, which have different architectures. Here we will focus on Feed Forward Neural Networks. Each neural unit receives several numerical inputs, weights each input and sums them; before being sent to the output, the resulting numerical value is processed by a non-linear function called the activation function, as shown in the following picture:
“Neural Network Architecture”
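To make the picture concrete, here is what a single neural unit computes, sketched in R; the input values, weights and bias below are made-up numbers, purely for illustration:

```r
# A single neural unit: a weighted sum of the inputs plus a bias,
# passed through a sigmoid activation function.
sigmoid <- function(x) 1 / (1 + exp(-x))

neural_unit <- function(inputs, weights, bias) {
  sigmoid(sum(inputs * weights) + bias)
}

# Four inputs, like the four iris measurements; weights and bias are made up
x <- c(5.1, 3.5, 1.4, 0.2)
w <- c(0.2, -0.5, 0.1, 0.3)
b <- -0.1
neural_unit(x, w, b)  # a single number between 0 and 1
```

The sigmoid squashes any weighted sum into the interval (0, 1), which is why the output of each unit can be read as a kind of activation strength.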
So a Neural Network is basically defined by its architecture (the layers of neural units and their connections), the weights of those connections, and the activation function.
As a first step we have to create the Neural Network capable of classifying the Iris species. We will be using a standard method based on supervised learning: 66% of the Iris Data Set above will be used for training, and the remaining 33% will be used to validate the Neural Network.
So basically the Neural Network will learn from examples and then will be evaluated with a classification test.
The study is based on the R package nnet, a simple package that supports only one hidden layer, but it is powerful enough to train good models and well suited for tutorial purposes.
Let’s split the Iris data into a training set to learn from and a testing set to test on, according to the 66% - 33% ratio.
set.seed(180)
i <- sample(1:150,100)
iris_train <- iris[i,]
iris_test <- iris[-i,]
summary(iris_train)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.10 setosa :36
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.500 1st Qu.:0.20 versicolor:37
Median :5.700 Median :3.000 Median :4.100 Median :1.30 virginica :27
Mean :5.811 Mean :3.084 Mean :3.607 Mean :1.11
3rd Qu.:6.325 3rd Qu.:3.400 3rd Qu.:5.025 3rd Qu.:1.80
Max. :7.900 Max. :4.400 Max. :6.700 Max. :2.50
Now we need 4 neural nodes in input, fed by the Sepal and Petal length and width, and 3 neural nodes in output, to give the probability of each species. Let’s consider only one neural node in the hidden layer (only one hidden layer is considered here for simplicity); the activation function we consider is the sigmoid, and the resulting Neural Network looks as follows:
library(devtools)
source_url('https://gist.githubusercontent.com/fawda123/7471137/raw/466c1474d0a505ff044412703516c34f1a4684a5/nnet_plot_update.r')
SHA-1 hash of file is 74c80bd5ddbc17ab3ae5ece9c0ed9beb612e87ef
library(nnet)
set.seed(18)
irnn <- nnet(Species ~ ., data = iris_train, size = 1, maxit = 0)
# weights: 11
plot.nnet(irnn)
Loading required package: scales
Loading required package: reshape
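As a side note, the figure “# weights: 11” printed above can be derived from the architecture: the hidden node receives the 4 inputs plus a bias, and each of the 3 output nodes receives the hidden node’s value plus a bias. A quick check in R:

```r
# Weight count for a fully connected network with biases:
# (inputs + bias) * hidden nodes + (hidden nodes + bias) * output nodes
n_weights <- function(n_in, n_hidden, n_out) {
  (n_in + 1) * n_hidden + (n_hidden + 1) * n_out
}
n_weights(4, 1, 3)  # 11, as reported by nnet for size = 1
n_weights(4, 3, 3)  # 27, as reported by nnet for size = 3 later on
```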
So model creation at this point basically consists of “tuning” the 11 weights (which include the bias values). This is obtained in an iterative way, starting with random numbers and then changing them with a method called backpropagation, in order to minimize the output error, and so to minimize the misclassification. The iris_train data is used several times to obtain convergence. Let’s have a look at how such a network behaves before “learning”, that is, before the weights are tuned, which basically means that we are using random numbers.
irpred <- predict(irnn, iris_train, type = "class")
table(iris_train$Species, irpred)
irpred
setosa
setosa 36
versicolor 37
virginica 27
sum(iris_train$Species == irpred) / length(iris_train$Species)
[1] 0.36
Unsurprisingly the accuracy is 36%: with random weights the network predicts setosa for every row, so it is right only for the 36 setosa rows out of 100, which is basically a random choice of the Iris type. Now let’s apply one learning cycle and see if it improves, so if it learns.
library(nnet)
set.seed(18)
irnn <- nnet(Species ~ ., data = iris_train, size = 1, maxit = 1)
# weights: 11
initial value 111.716872
final value 111.000802
stopped after 2 iterations
irpred <- predict(irnn, iris_train, type = "class")
table(iris_train$Species, irpred)
irpred
setosa
setosa 36
versicolor 37
virginica 27
sum(iris_train$Species == irpred) / length(iris_train$Species)
[1] 0.36
Well, it seems that the first learning cycle didn’t improve things, so let the Neural Network play again and again with the data set. As the Romans said, “Repetita iuvant”: repeating things helps! Let’s apply the nnet package default of 100 repetitions.
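To see why repetition helps, here is a schematic sketch of iterative error minimization on a single made-up weight: each pass nudges the weight in the direction that reduces the squared error. (nnet actually uses the BFGS optimizer rather than plain gradient descent, but the principle of iteratively reducing the error is the same.)

```r
# Schematic gradient descent on one made-up weight: repeated passes
# reduce the squared error between the unit's output and the target.
sigmoid <- function(x) 1 / (1 + exp(-x))

x <- 1.0        # a single input
target <- 1.0   # the desired output
w <- 0.0        # start from an arbitrary weight
lr <- 0.5       # learning rate: size of each correction step

for (iter in 1:100) {
  out  <- sigmoid(w * x)
  grad <- 2 * (out - target) * out * (1 - out) * x  # derivative of the squared error
  w    <- w - lr * grad                             # step against the gradient
}
sigmoid(w * x)  # much closer to the target than the initial output of 0.5
```

Each iteration makes only a small correction, which is why many passes over the data are needed before the error stops decreasing.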
set.seed(18)
irnn <- nnet(Species ~ ., data = iris_train, size = 1)
# weights: 11
initial value 111.716872
iter 10 value 48.716977
iter 20 value 43.578253
final value 43.577037
converged
irpred <- predict(irnn, iris_train, type = "class")
table(iris_train$Species, irpred)
irpred
setosa versicolor
setosa 36 0
versicolor 0 37
virginica 0 27
sum(iris_train$Species == irpred) / length(iris_train$Species)
[1] 0.73
Good, the Neural Network has improved a lot! Now it classifies the Iris species with 73% accuracy, at least on iris_train. But let’s see if it can recognize the Species in a new data set, never seen before and not used for training: iris_test.
irpred <- predict(irnn, iris_test, type = "class")
table(iris_test$Species, irpred)
irpred
setosa versicolor
setosa 14 0
versicolor 0 13
virginica 0 23
sum(iris_test$Species == irpred) / length(iris_test$Species)
[1] 0.54
Good! We are at 54% with only one hidden neural node. Let’s have a look at it; note how the weights changed, as you can see from the size of the links between the neural nodes.
plot.nnet(irnn)
Now let’s try a more complex Neural Network; in this case we increase the number of neural nodes in the hidden layer to 3.
set.seed(18)
irnn <- nnet(Species ~ ., data = iris_train, size = 3)
# weights: 27
initial value 124.056864
iter 10 value 27.920209
iter 20 value 0.210898
iter 30 value 0.002851
final value 0.000071
converged
irpred <- predict(irnn, iris_train, type = "class")
table(iris_train$Species, irpred)
irpred
setosa versicolor virginica
setosa 36 0 0
versicolor 0 37 0
virginica 0 0 27
sum(iris_train$Species == irpred) / length(iris_train$Species)
[1] 1
A very good improvement! We are now at 100% on the training set. It is not surprising that we improved: the second model has a higher number of neural nodes and connections among them. So basically we evolved the species! Let’s test it against the testing set.
irpred <- predict(irnn, iris_test, type = "class")
table(iris_test$Species, irpred)
irpred
setosa versicolor virginica
setosa 14 0 0
versicolor 0 11 2
virginica 0 0 23
sum(iris_test$Species == irpred) / length(iris_test$Species)
[1] 0.96
We are doing very well: 96% accuracy! Let’s have a look at the Neural Net:
plot.nnet(irnn)
We have trained a simple Neural Network, based on the available data, that learned how to classify the Iris species based on sepal and petal length and width. The training consisted of tuning the weights of the Neural Network by going through the training set several times. This process is pretty similar to a human learning process, where a person is trained in a controlled environment and can check his results against the actual results.
After the training is completed, meaning that the person is making a limited number of errors, he is ready to face real cases, that is, cases where he doesn’t know the answer and has to apply what he has learnt.
This is exactly what we did here with this simple Neural Net exercise. Now the Neural Net can classify the Iris. Note also the effect of more neural nodes in the hidden layer, which improved the test accuracy from 54% to 96%. Note also that an untrained network provides random guesses.